AITopics | validation task

validation task

Aligning Validation with Deployment: Target-Weighted Cross-Validation for Spatial Prediction

Brenning, Alexander, Suesse, Thomas

arXiv.org Machine Learning, Apr-1-2026

Cross-validation (CV) is commonly used to estimate predictive risk when independent test data are unavailable. Its validity depends on the assumption that validation tasks are sampled from the same distribution as prediction tasks encountered during deployment. In spatial prediction and other settings with structured data, this assumption is frequently violated, leading to biased estimates of deployment risk. We propose Target-Weighted CV (TWCV), an estimator of deployment risk that accounts for discrepancies between validation and deployment task distributions, covering both (1) covariate shift and (2) task-difficulty shift. We characterize prediction tasks by descriptors such as covariates and spatial configuration. TWCV assigns weights to validation losses such that the weighted empirical distribution of validation tasks matches the corresponding distribution over a target domain. The weights are obtained via calibration weighting, yielding an importance-weighted estimator that targets deployment risk. Since TWCV requires adequate coverage of the deployment distribution's support, we combine it with spatially buffered resampling that diversifies the task difficulty distribution. In a simulation study, conventional as well as spatial estimators exhibit substantial bias depending on sampling, whereas buffered TWCV remains approximately unbiased across scenarios. A case study in environmental pollution mapping further confirms that discrepancies between validation and deployment task distributions can affect performance assessment, and that buffered TWCV better reflects the prediction task over the target domain. These results establish task distribution mismatch as a primary source of CV bias in spatial prediction and show that calibration weighting combined with a suitable validation task generator provides a viable approach to estimating predictive risk under dataset shift.
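The weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single scalar task descriptor (here, a hypothetical distance to the nearest training point), uses exponential tilting as one common form of calibration weighting, and matches only the descriptor's mean rather than its full distribution. The function names and all numbers are invented for illustration.

```python
import math

def tilted_weights(x, lam):
    """Normalized weights w_i proportional to exp(lam * x_i)."""
    m = max(lam * xi for xi in x)  # subtract the max exponent for numerical stability
    raw = [math.exp(lam * xi - m) for xi in x]
    total = sum(raw)
    return [r / total for r in raw]

def calibrate(x, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisection on lam so that the weighted mean of x equals target_mean.

    The weighted mean is increasing in lam, so bisection is sufficient.
    """
    if not min(x) < target_mean < max(x):
        raise ValueError("target mean must lie strictly inside the range of x")
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        w = tilted_weights(x, lam)
        mean = sum(wi * xi for wi, xi in zip(w, x))
        if abs(mean - target_mean) < tol:
            break
        if mean < target_mean:
            lo = lam
        else:
            hi = lam
    return w

def weighted_risk(losses, w):
    """Importance-weighted average of the validation losses."""
    return sum(wi * li for wi, li in zip(w, losses))

# Hypothetical validation tasks: descriptor value and observed loss per task.
distances = [0.1, 0.2, 0.4, 0.8, 1.6]
losses = [0.5, 0.6, 0.9, 1.4, 2.0]
# Assumed mean descriptor value over the deployment (target) domain.
weights = calibrate(distances, target_mean=1.0)
print(weighted_risk(losses, weights))
```

In this toy setting the target domain contains harder tasks (larger distances) than the validation sample, so the weights shift mass toward the high-loss tasks and the weighted risk exceeds the plain average of the losses.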

artificial intelligence, machine learning, modeling & simulation, (17 more...)

arXiv.org Machine Learning

2603.29981

Country:

Europe > Germany (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.66)
Law (0.48)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)

STL: Still Tricky Logic (for System Validation, Even When Showing Your Work)

Isabelle Hurley

Neural Information Processing Systems, Feb-18-2026, 08:03:22 GMT

Previous work showed that despite claims of interpretability, humans are unable to use formal specifications presented in a variety of ways to validate even simple robot behaviors.

logic & formal reasoning, machine learning, specification, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Massachusetts > Middlesex County > Lexington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.67)

Industry:

Education > Educational Setting (1.00)
Government > Military (0.93)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.90)

ec3183a7f107d1b8dbb90cb3c01ea7d5-Paper.pdf

Neural Information Processing Systems, Feb-10-2026, 23:47:44 GMT

agent, algorithm, training task, (12 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
North America > United States (0.04)
North America > Canada (0.04)

Industry: Energy (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

ec3183a7f107d1b8dbb90cb3c01ea7d5-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-10-2026, 23:47:33 GMT

algorithm, optimal policy, training task, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Supplement to Node Classification on Graphs with Few-Shot Novel Labels via Meta Transformed Network Embedding

Neural Information Processing Systems, Nov-15-2025, 05:57:22 GMT

1 Additional Algorithm Details

1.1 Details of the Transformation Function

The support nodes are either positive or negative. For the transformation function, we stack multiple computation blocks as shown in Figure 1. The stacking mechanism helps the function capture comprehensive relationships between nodes such that the performance is boosted. In each computation block, there are mainly two modules. The detailed architecture of the self-attention module is illustrated in Figure 1.

dataset, node, query node, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

STL: Still Tricky Logic (for System Validation, Even When Showing Your Work)

Isabelle Hurley

Neural Information Processing Systems, Oct-10-2025, 18:14:53 GMT

Previous work showed that despite claims of interpretability, humans are unable to use formal specifications presented in a variety of ways to validate even simple robot behaviors.

experiment, specification, trajectory, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Massachusetts > Middlesex County > Lexington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.67)

Industry:

Education > Educational Setting (1.00)
Government > Military (0.93)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.90)

Toward PDDL Planning Copilot

Benyamin, Yarin, Mordoch, Argaman, Shperberg, Shahaf S., Stern, Roni

arXiv.org Artificial Intelligence, Sep-17-2025

Large Language Models (LLMs) are increasingly being used as autonomous agents capable of performing complicated tasks. However, they lack the ability to perform reliable long-horizon planning on their own. This paper bridges this gap by introducing the Planning Copilot, a chatbot that integrates multiple planning tools and allows users to invoke them through instructions in natural language. The Planning Copilot leverages the Model Context Protocol (MCP), a recently developed standard for connecting LLMs with external tools and systems. This approach allows using any LLM that supports MCP without domain-specific fine-tuning. Our Planning Copilot supports common planning tasks such as checking the syntax of planning problems, selecting an appropriate planner, calling it, validating the plan it generates, and simulating its execution. We empirically evaluate the ability of our Planning Copilot to perform these tasks using three open-source LLMs. The results show that the Planning Copilot substantially outperforms the same LLMs used without the planning tools. We also conducted a limited qualitative comparison of our tool against Chat GPT-5, a very recent commercial LLM. Our results show that our Planning Copilot significantly outperforms GPT-5 despite relying on a much smaller LLM. This suggests dedicated planning tools may be an effective way to enable LLMs to perform planning tasks.
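As a rough illustration of what "validating a plan" and "simulating its execution" involve, the sketch below checks a plan under simplified STRIPS-style semantics. It is not the Planning Copilot's code and does not use MCP or parse PDDL: the Action class, the validate_plan function, and the toy domain are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A ground action with STRIPS-style preconditions and effects (assumed model)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset

def validate_plan(initial_state, goal, plan):
    """Simulate the plan step by step; return (ok, message)."""
    state = set(initial_state)
    for step, action in enumerate(plan, start=1):
        missing = action.preconditions - state
        if missing:
            return False, f"step {step} ({action.name}): unmet preconditions {sorted(missing)}"
        # Apply delete effects first, then add effects.
        state = (state - action.del_effects) | action.add_effects
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached: missing {sorted(unmet)}"
    return True, "plan is valid"

# Toy domain: move one object from a table to a shelf.
pick = Action(
    "pick-up",
    frozenset({"hand-empty", "on-table"}),
    frozenset({"holding"}),
    frozenset({"hand-empty", "on-table"}),
)
put = Action(
    "put-on-shelf",
    frozenset({"holding"}),
    frozenset({"on-shelf", "hand-empty"}),
    frozenset({"holding"}),
)

start = {"hand-empty", "on-table"}
print(validate_plan(start, {"on-shelf"}, [pick, put]))
print(validate_plan(start, {"on-shelf"}, [put, pick]))
```

The first call executes both actions and reaches the goal; the second fails at step 1 because the object is not yet being held. A tool of this kind gives an LLM a deterministic check on a candidate plan instead of relying on the model's own judgment.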

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.12987

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Information-theoretic Task Selection for Meta-Reinforcement Learning

Neural Information Processing Systems, Aug-17-2025, 03:44:58 GMT

A common framework consists in modeling the range of tasks the agent may encounter as a distribution over all possible tasks.

machine learning, reinforcement learning, training task, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
North America > United States (0.04)
North America > Canada (0.04)

Industry: Energy (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

ec3183a7f107d1b8dbb90cb3c01ea7d5-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-17-2025, 03:44:45 GMT

Paper ID 10791

Title: Information-Theoretic Task Selection for Meta-Reinforcement Learning

We thank all the reviewers for their thoughtful feedback. Our response can be found below, organized by review.

R1 "It is not yet clear how results on such simple "toy" tasks will, if ever, generalize to practically important task distributions. But this current limitation does and should not stop progress towards such seminal contributions."

Thank We agree that scalability to more complex settings is challenging (more on this in response to Reviewer 3), but this is a challenge for all of meta-RL. We introduce a method that identifies a clear gap in the literature, and that provides a first solution to the problem, which performs reliably well in a number of current meta-RL benchmarks.

artificial intelligence, machine learning, training task, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
